Skip to content

Conversation

@kazanzhy
Copy link
Contributor

Merge common code for SnowflakeHook, TrinoHook, and PrestoHook to

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok so... what is the purpose of this method? what utility does it present?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just trying to keep query_ids from the Snowflake.
It seems useless for Airflow itself, but Hook users can use it for tracking their queries

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok so if we keep this, i'd prefer to make the interface on DBApi more generic, like _post_execute_hook. Or something like that. Maybe there's a better name out there. Then snowflake can implement it and use it to store the query ids.

One problem is that hooks do a variety of executing in a variety of contexts. So, perhaps we can come up with something that's a little more generic, i.e. compatible with if, say, we wanted to add a hook / callback in a different context within the functionality of the hook.

Meanwhile, I'm checking with @mobuchowski to see whether we still need query ids on snowflake hook for the original purpose (which was lineage) or whether we are now scraping that info some other way.

Thanks

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for the record - see the k
ind of problems that the common.sql MIGHT introduce if we remove some seeminglly unused methods.

#27838

I think we should be very, very, very careful when we remove anything from common.sql provider. Because we might simply not realize that it has been used.

This is the effect of common code in full swing if you do not have very well defined "public API" upfront AND you do not check (Python WTF) that even if you declared something as protected, some other entity might rely on it.

This is the learnings we should have that once we make some code common and used elsewhere, refactoring and removing stuff from it should be done extremely carefully.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hey @potiuk i thought your position was "let's just break things" ;)

@mobuchowski it looks like you rely on such an attr on the operator class not the hook

is that correct? To me, that makes a lot more sense. Although, it looks like in main this may be broken, because I do not see anything that touches the query_ids attr on this operator after init.

Copy link
Contributor

@mobuchowski mobuchowski Nov 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dstandish you're right, we're using it on operator - looks like it was broken on #25717.

It's not a critical for us - we can accept this breakage if we can agree on exposing this kind of information from hooks via operators in any other way. I think it would be useful for any other kinds of metadata.

For example, I dislike the fact that SQLExecuteQueryOperator gets hook in execute method, but does not set it as self instance variable.

There's a lot to be done on how to expose execution metadata.

@kazanzhy kazanzhy force-pushed the merge_sf_trino_presto_dbapihook branch from c4be9e9 to 9953a2d Compare November 18, 2022 22:28
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method still implemented here to keep Snowflake's split_statements: bool = True default parameter

@kazanzhy kazanzhy marked this pull request as ready for review November 18, 2022 22:31
@kazanzhy kazanzhy force-pushed the merge_sf_trino_presto_dbapihook branch from 9953a2d to e0bd57c Compare November 20, 2022 23:56
@potiuk
Copy link
Member

potiuk commented Nov 23, 2022

I think we need to approach any such changes very, very carefully after #27854 and #27838 - this is really manifestation of the fact that when code is closely coupled, separating common functionality to separate "component" is HARD and have to be very, carefully reviewed.

The scenarios where old providers use new common.sql have to be carefully considered - and this is what should happen before we merge this one. Possibly we should even wait until we add some automated tests to verify some of the faillure scenarios.

@kazanzhy kazanzhy force-pushed the merge_sf_trino_presto_dbapihook branch from e0bd57c to 284bdd4 Compare November 27, 2022 23:52
@kazanzhy
Copy link
Contributor Author

I totally agree with you.
I saw that some of my changes broke backward compatibility. I will try to be more careful with further changes.
Must say that the previous ones were too big and hard to review.

@potiuk
Copy link
Member

potiuk commented Nov 28, 2022

I totally agree with you. I saw that some of my changes broke backward compatibility. I will try to be more careful with further changes. Must say that the previous ones were too big and hard to review.

Yeah. More testing and the #27946 might be helpful :). I think that was good learning for all of us

@kazanzhy kazanzhy force-pushed the merge_sf_trino_presto_dbapihook branch from 284bdd4 to 59849fd Compare November 30, 2022 23:54
@potiuk
Copy link
Member

potiuk commented Dec 5, 2022

Needs conflict resolution/rebase now.

@eladkal
Copy link
Contributor

eladkal commented Dec 19, 2022

hi @kazanzhy do we still need this PR? if so please rebase

@potiuk
Copy link
Member

potiuk commented Jan 20, 2023

@kazanzhy ?

@github-actions
Copy link

github-actions bot commented Mar 7, 2023

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Mar 7, 2023
@github-actions github-actions bot closed this Mar 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:providers provider:common-sql provider:snowflake Issues related to Snowflake provider stale Stale PRs per the .github/workflows/stale.yml policy file

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants